Perceptual Significance of Cepstral Distortion Measures in Digital Speech Processing

نویسندگان

  • Antonio Vasilijević
  • Davor Petrinović
چکیده

Currently, one of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel frequency cepstral coefficients (MFCC). MFCCs are based on filter bank algorithm whose filters are equally spaced on a perceptually motivated mel frequency scale. The value of mel cepstral vector, as well as the properties of the corresponding cepstral distance, are determined by several parameters used in mel cepstral analysis. The aim of this work is to examine compatibility of MFCC measure with human perception for different values of parameters in the analysis. By analysing mel filter bank parameters it is found that filter bank with 24 bands, 220 mels bandwidth and band overlap coefficient equal and higher than one gives optimal spectral distortion (SD) distance measures. For this kind of mel filter bank, the difference between vowels can be recognised for fulllength mel cepstral SD RMS measure higher than 0.4 0.5 dB. Further on, we will show that usage of truncated mel cepstral vector (12 coefficients) is justified for speech recognition, but may be arguable for speaker recognition. We also analysed the impact of aliasing in cepstral domain on cepstral distortion measures. The results showed high correlation of SD distances calculated from aperiodic and periodic mel cepstrum, leading to the conclusion that the impact of aliasing is generally minor. There are rare exceptions where aliasing is present, and these were also analysed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Improved Speech Processing Strategy for Cochlear Implants Based on objective measures for predicting speech intelligibility

The purpose of this study was to improve the speech processing strategy for cochlear implants (CIs) A speech preprocessing algorithm is presented to improve the speech intelligibility in noise. The algorithm improves the intelligibility by optimally redistributing the speech energy over time and frequency for a perceptual distortion measure, the algorithm is more sensitive to transient regions....

متن کامل

Reducing the effects of linear channel distortion on continuous speech recognition

Linear channel compensation in speech recognition typically involves estimating an additive shift in the cepstral domain. This paper explores both Bayesian and maximum likelihood techniques to transform either the features or the model parameters. Experiments on the Macrophone corpus show error rate reductions over cepstral mean subtraction for short utterances.

متن کامل

Robust distant speech recognition based on position dependent CMN using a novel multiple microphone processing technique

In a distant environment, channel distortion may drastically degrade speech recognition performances. In this paper, we propose a robust multiple microphone speech processing approach based on position dependent Cepstral Mean Normalization (CMN). In the training stage, the system measures the transmission characteristics according to the speaker positions from some grid points in the room and e...

متن کامل

Application of speech conversion to alaryngeal speech enhancement

Two existing speech conversion algorithms were modified and used to enhance alaryngeal speech. The modifications were aimed at reducing spectral distortion (bandwidth increase) in a vector-quantization (VQ) based system and the spectral discontinuity in a linear multivariate regression (LMR) based system. Spectral distortion was compensated for by formant enhancement using chirp z-transform and...

متن کامل

On the use of bandpass liftering in speech recognition

Alstract-In a template-based speech recognition system, distortion measures that compute the distance or dissimilarity between two spectral representations have a strong influence on the performance of the recognizer. Accordingly, extensive comparative studies have been conducted to determine good distortion measures for improved recognition accuracy. Previous studies have shown that the log li...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011